Snowball: strain aware gene assembly of metagenomes
نویسندگان
چکیده
MOTIVATION Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. RESULTS We have developed Snowball, a novel strain aware gene assembler for shotgun metagenomic data that does not require closely related reference genomes to be available. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of individual gene domains based on read overlaps and error correction using read quality scores at the same time, which results in very low per-base error rates. AVAILABILITY AND IMPLEMENTATION The software runs on a user-defined number of processor cores in parallel, runs on a standard laptop and is available under the GPL 3.0 license for installation under Linux or OS X at https://github.com/hzi-bifo/snowball CONTACT [email protected],[email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
Species Divergence vs. Functional Convergence Characterizes Crude Oil Microbial Community Assembly
Oil reservoirs exhibit extreme environmental conditions such as high salinity and high temperature. Insights into microbial community assemblages in oil reservoirs and their functional potentials are important for understanding biogeochemical cycles in the subterranean biosphere. In this study, we performed shotgun metagenomic sequencing of crude oil samples from two geographically distant oil ...
متن کاملDe novo extraction of microbial strains from metagenomes reveals intra-species niche partitioning
Background We introduce DESMAN for De novo Extraction of Strains from MetAgeNomes. Metagenome sequencing generates short reads from throughout the genomes of a microbial community. Increasingly large, multi-sample metagenomes, stratified in space and time are being generated from communities with thousands of species. Repeats result in fragmentary co-assemblies with potentially millions of cont...
متن کاملGut metagenomes of type 2 diabetic patients have characteristic single-nucleotide polymorphism distribution in Bacteroides coprocola
BACKGROUND Gut microbes play a critical role in human health and disease, and researchers have begun to characterize their genomes, the so-called gut metagenome. Thus far, metagenomics studies have focused on genus- or species-level composition and microbial gene sets, while strain-level composition and single-nucleotide polymorphism (SNP) have been overlooked. The gut metagenomes of type 2 dia...
متن کاملPRICE: Software for the Targeted Assembly of Components of (Meta) Genomic Sequence Data
Low-cost DNA sequencing technologies have expanded the role for direct nucleic acid sequencing in the analysis of genomes, transcriptomes, and the metagenomes of whole ecosystems. Human and machine comprehension of such large datasets can be simplified via synthesis of sequence fragments into long, contiguous blocks of sequence (contigs), but most of the progress in the field of assembly has fo...
متن کاملAn integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography.
We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 32 17 شماره
صفحات -
تاریخ انتشار 2016